Mon. Not. R. Astron. Soc. 000, 000-000 (0000) Printed 28 June 2012 (MN KTbX style file v2.2) 



Multiscale Inference of Matter Fields and Baryon Acoustic 
Oscillations from the Lya Forest 



Francisco-Shu Kitaura 12 *, Simona Gallerani 3 and Andrea Ferrara 1 

1 SNS, Scuola Normale Superiore di Pisa, Piazza dei Cavalieri, 7 - 56126 Pisa — Italy 

2 LMU, Ludwig-Maximilians Universitat Miinchen, Scheinerstr. I - D-81679 Munich — Germany 

3 INAF-Ossen'atorio Astronomic/) di Roma, via di Frascati, 33 - 00040 Monteporz.io Catone — Italy 



(N 
(N 



O 

u 

6 



> 

(N 



28 June 2012 



ABSTRACT 

We present a novel Bayesian method for the joint reconstruction of cosmological matter den- 
sity fields, peculiar velocities and power-spectra in the quasi-nonlinear regime. We study its 
applicability to the Lya forest based on multiple quasar absorption spectra. Our approach to 
this problem includes a multiscale, nonlinear, two-step scheme since the statistics describing 
the matter distribution depends on scale, being strongly non-Gaussian on small scales (< 0.1 
h^ 1 Mpc) and closely lognormal on scales > 10 h^ 1 Mpc. The first step consists on perform- 
ing ID highly resolved matter density reconstructions along the line-of-sight towards z ^2-3 
quasars based on an arbitrary non-Gaussian univariate model for matter statistics. The second 
step consists on Gibbs-sampling based on conditional PDFs. The matter density field is sam- 
pled in real-space with Hamiltonian-sampling using the Poisson/Gamma-lognormal model, 
while redshift distortions are corrected with linear Lagrangian perturbation theory. The power- 
spectrum of the lognormal transformed variable which is Gaussian distributed (and thus close 
to the linear regime) can consistently be sampled with the inverse Gamma distribution func- 
tion. We test our method through numerical N-body simulations with a computational vol- 
ume large enough (> 1 h~ 3 Gpc 3 ) to show that the linear power-spectra are nicely recovered 
over scales larger than > 20 hr l Mpc, i.e. the relevant range where features imprinted by the 
baryon-acoustics oscillations (BAOs) appear. 

Key words: (cosmology:) large-scale structure of Universe - methods: data analysis - meth- 
ods: statistical - quasars: absorption lines 



1 INTRODUCTION 

In the current cosmological picture, structures in the Universe have 
grown from tiny fluctuations through gravitational clustering. Non- 
linear processes of structure formation destroy the information 
about the origin of our Universe. This information is, however, still 
encoded in the large scales in which structures are close to the lin- 
ear regime. In particular baryon-photon plasma oscillations can be 
detected in the large-scale structure (LSS) as remnants of the early 
Universe. Their characteristic scale is measurable as an oscillatory 
patter n in the matter power-spectrum (see e. g. lEisenstein & HvJ 
1998) which evolves in time in such a way that they can be used 
as st andard rulers and to constrain cosmological parameters (see 
e. g. [Eisenstein 2007) . They can also be used to study dark energy 



Background (see e. g. lHinshawet"aT]l2003h. but also from galax 



evolution (see e. g. Wang 2006). For these reasons it is especially 



interesting to have measurements of baryon-acoustic oscillations 
(BAOs) at different redshifts and with different matter tracers con- 
firming the same underlying physics. Therefore many efforts have 
been made to detect BAOs not only from the Cosmic Microwave 
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redsh i ft surveys at low red shifts (see lEisenstein et alj|2005l : iHuts 



axy 
iitsi 



2006; Percival e t al J 120071) . and even from photometric redshifts 
(see lBlake et al.l l2007). Moreover, using the luminous distribution 
of galaxies as large-scale tracers becomes extremely expensive as 
larger volumes need to be surveyed. 

The neutral hydrogen of the intergalactic medium (IGM) rep- 
resents an appealing alternative as pointed out by several groups 
(see e. g. iMcDonald & EisenstehJ [20071 ; ISlosar et alj|2009h . The 
detection of BAOs from the indirect measurements of the large- 
scale structure of the IGM would provide a complementary study 
exploiting a completely different matter tracer. Moreover, it would 
scan a different redshift range at which structures are closer to lin- 
ear regime and thus less information loss on the cosmic initial con- 
ditions has occurred. However, many complications arise when try- 
ing to perform such a study. Extremely luminous sources (usually 
quasars, but more recently also Gamma Ray Bursts) are required 
to shine through the IGM as distant lighthouses to allow the detec- 
tion of its absorption features. The ultraviolet radiation emitted by a 
quasar suffers resonant Lya scattering as it propagates through the 
intergalactic neutral hydrogen. In this process, photons are removed 
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from the line-of-sight resulting in an attenuation of the source flux, 
the so-called Gunn-Peterson effect. The measurement of multiple 
quasar absorption sight-line spectra traces the LSS. 

The first problem consists on translating the observed flux of 
a quasar absorption spectrum into the underlying density field. The 
explicit flux-density relationship is complex. One of the pioneer- 
ing w orks to invert this relation was done by iNusser & Haehnelj 
(1999). Here an explicit relation between the HI optical depth, 
which is related to the observed flux, and the matter fiel d is inverted 
using the Richard son-Lucy deconvolution algorithm ( Richardson] 
ll972l;lLucvl[T974h . In this approach a particular equation of state 
for the IGM has to be assumed which implies the knowledge of 
the gas thermal and ionization history. To alleviate these problems, 
we propose to follow the works of loc al mapping meth ods pro- 
posed to Gaussianize cosmic fields (see IWeinberd 1 19920 and ac- 
tuall y applied to power-spe ctrum estimation from the Lya forest 
(see lCroft et al.ll 1998L 1 1999h . We note that this approach is tightly 
related to the bia si ng studies from galax y surveys proposed by 
ISigad et alj d200d) ; ISzapudi & Par] d2004t) . In particular we rely 
on the recently developed ID technique that recovers the nonlin- 
ear density field from Lya data without information on the equa- 
tion of state, the therm al history or the ionization level of the IGM 
dGallerani et alj|201 ll) . The strong correlation we found between 
the flux and the matter density enables one to establish a statistical 
one-to-one relation between the probability density of the flux and 
the matter one. This approach reduces all the assumptions to the 
knowledge of the matter statistics which is well constrained by N- 
body simulations. It permits us to deal with strongly skewed matter 
probability distribution functions (PDFs) which apply at very small 
scales (< 1 hT 1 Mpc) corresponding to the Jeans scale of the IGM. 
This method is especially well suited for the large-scale structure 
analysis as it makes a minimum number of assumptions and is 
computationally very efficient. As we will show the main sources 
of uncertainties in the density field along the line-of-sight with our 
approach will come from the peculiar motions neglecting errors in 
the determination of the continuum flux. 

The second problem affecting power-spectrum extraction 
from a set of multiple quasar sight lines comes from the win- 
dow function. A number of well established techniques perform- 



(see e. g. 


Feldman et all Il994 


; iTesmarkll 19951: iHamiltonll 19971: 


lYamamotol 


20031; Percivaletal 


2004; PercivalH2005l). The main 



problem which arises when doing such a study comes from the 
aliasing introduced by the mask and selection effects of the survey. 
Recently, great effort has been put in designing surveys which have 
well behaved mas ks to minimize thes e effects, as the 2dF Galaxy 
Redshift Surv eJI dColless et al.ll2003l) and the Sloan Digital Sky 
Survey (SPSS n dAbazaiian et al.ll2009l) . This strategy is very lim- 
ited in the case of the Lyman alpha forest. The nature of the observ- 
able, namely multiple line-of-sight quasar spectra, produces a com- 
plex 3D completeness with many unobserved regions in-between 
the spectra. This effect is unavoidable as qu asars are spa r sely d is- 
tributed in space. An alternative proposed by ISlosar et al.1 J2009) is 
to work with cross-power spectra. 

Here we propose to extend the Bayesia n nonlinear 
Poisso n/Gamma-lognormal models developed in iKitaura et al.l 
(2010) together with the Gibbs and Hamiltonian sampling tech- 
nique as presented bv lKitaura &~En filin (2008D; |jasche et alJ fcoiO) 



1 http ://www. mso.anu.edu. au/2dFGRS/ 
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and lJasche & Kitaural J20T0I) to measure power-spectra and detect 
BAOs. This approach enables us to measure the features in three 
dimensional power- spectrum sampling over a large ensemble of 
density realizations and power-spectra. While the concept of data 
augm entation with constrained r e alizations is clear in the Gaussian 
case (Hoffman & Ribak| 1 199 lb | van de Weygaert & Bertschingerl 
1996), it becomes very complex for nonlinear density distributions 
resulting only in an approx imate procedu re in most o f the c ases 
( iBistolas & Hoffman|[l998h . The work of lPichon etall d200ll) re- 
lie s on a nonlinear least squ ares approach reconstruction method 
bv lTarantola & Valettelfl982h which is based on Gaussian distribu- 
tion functions including a nonlinear transformation. This method 
has been shown to be adequate to stud y the topology of the IGM 
dPichon et alfeOOlblCaucci et alj|2008h . 

The Hamiltonian sampling technique is more general and 
enables one t o sample nonlinear distribution functions in an ex- 
act way (see iDuane et alj 1 1981 iNeall 1 19931 : iTavlor et all 120081: 
Ijasche & Kitaurdl2010h . The data augmentation step is built-in in 
this method as the posterior distribution function of the matter field 
given the data is fully sampled. However, as we will show here 
the power-spectrum cannot be extracted in a straightforward way 
solely with the Hamiltonian sampling scheme. 

In this work we propose a simple and efficient approach to 
jointly sample power-spectra and non-Gaussian density fields. The 
idea consists on using the Gaussian prior for the matter field and 
encoding the transformation of the non-Gaussian density field into 
its linear-Gaussian component in the likelihood. The advantage 
of this approach is that we can model the PDF of the power- 
spectrum with the inverse Gamma distribut ion in a consistent wa y 
under the Gaussian prior assumption (see lKitaura & Enfilinll2008h . 
We rely on the lognormal transformation to relate the nonlinear 
density field and its linear count er-part as it has been done in 
other works to model the IGM (see lViel et alj2002l : lGallerani et all 
2006). Also note the recent works on the lognormal transfo r matio n 
and its relation to t he lin ear field by [Neyrinck et al. ( 2009, 1 201 ll) ; 
IKitaura & Anguld ( 1201 lh . As we know how to sample the density 
field and the power- spectrum conditioned on e ach other we can 
apply t he Gibbs-sampling scheme proposed in IKitaura & EnBlin 
(2008): |jasc"heet all 12010) extended here to non-Gaussian density 
fields. 

We show also how to solve within this framework for red- 
shift distortions in the quasi- nonlinear regime relying on linear La- 
grangian perturbation theory dZerdovichll 1970l : McGill 19905). 

Finally we validate our method with numerical tests based on 
a large N-body simulation at red shift z = 3 of 1.34 h ~ x Gpc side 
length which was performed by Ang ulo et all fe008l) to study the 
detectability of BAOs at different redshifts. 

The rest of the paper is structured as follows. In the next sec- 
tion (i|2j we summarize the multiscale approach; in Sj3]we present 
the reconstruction along the quasar sight lines and estimate the un- 
certainties including biasing, thermal broadening and peculiar mo- 
tions. In ij4]we present the Bayesian method to jointly recover the 
large-scale structure and its power- spectrum (with the BAO signal) 
correcting for redshift distortions based on a combination of the 
Gibbs and Hamiltonian sampling techniques. The numerical valida- 
tion experiments are presented in section i]4.2| Finally, we present 
our summary and conclusions. 
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2 MULTISCALE STATISTICAL INFERENCE 

The matter distribution changes as cosmic evolution triggers 
structure formation. The fluctuations of the Universe which are 
closely normal distributed in the early epochs start to become 
non-Gaussian through gravitational clustering at small scales. For 
the quasi-nonlinear regime the lognormal distribution function is 
known to give a good description. 

In this approximation a nonlinear transformation of the over- 
density field le ads to a quantity wh ich is assumed to be Gaus- 
sian distributed dColes& Joneslll99lh . This will be crucial for our 
power-spectrum sampling method (see []4j. 

This distribution can be considered to be valid at scales > 10 
h^ 1 Mpc for the redshift range (z ~ 2 — 3) we are interested in. 
However, on scales smaller than ~0. 1 h^ 1 Mpc the matter statis- 
tics shows a st r onger skewness than lognor mal (see the works by 
IColombi|[l994l : iMiralda-Escude e t al. 2000). The lognormal prior 
can also be used for the intermediate scales 1 — 10 h^ 1 Mpc, as it 
fits very well the positive tail of the overde nsity statistics, but it is 
not accurate in the underdense regions (see lKitaura et alj|2010h . 

Since the Jean's scale of the intergalactic gas as obtained 
by flux measureme nts of the Lya forest is < 1 h" 1 Mpc 
dGnedin & Hui|[l998h . an accurate reconstruction technique of the 
matter field along the spectra requires a precise treatment of the 
matter statistics on small scales. However, the baryon acoustic os- 
cillations are washed out on small scales and become increasingly 
prominent towards large scales (scales larger than 10 h^ 1 Mpc). 
We are thus dealing with two different problems which are defined 
on different scales. 

An approach trying to directly solve the full three dimensional 
problem would become either extremely complex or require strong 
approximations. The multivariate matter PDF can be modeled in 
the highly non-Gaussian regime by expanding the lognormal distri- 
bution with the multivariate Edgewort h expansio n lKitaural [2010]) 
in analogy to the univariate case (see IColombi|[l99ir However, 
such an expansion turns out to be extremely expensive and requires 
models for the multivariate higher order correlation functions (es- 
pecially the three-point and the four-point statistics to model skew- 
ness and kurtosis), introducing hereby additional parameters. 

Instead we propose in this work to radically simplify the prob- 
lem by splitting it into two characteristic scales: first the scale of 
the resolution of the data and second the scale of the minimum re- 
quired resolution to study the physical problem of interest, in this 
case BAOs. 

In the first step we propose to apply a fast and efficient ID 
reconstruction method along the line-of-sight (los) spectra. Note, 
that any (sufficiently fast) reconstruction method along the quasar 
los could be adopted (e.g. iNusser & Haehneltll 19991) . We consider, 
however, a method which we have recently develo ped as especially 
adequate for this work (see iGallerani et alj 1201 lh following the 
works of lo cal mapping methods proposed to Gauss ianize cosmic 
fields (see IWeinberg 1992!: ICroft etai]|l998l [l999l) . Our method 
is flexible to use an arbitrary univariate matter distribution model 
for scales < 0. 1 Mpc. The method avoids any assumption on the 
thermal or ionization histories of the IGM which may cumulatively 
affect the power on large scales and corrupt the BAOs analysis. It 
effectively includes all the physics linking the dark matter field to 
the observed flux, with the exception of peculiar motions, assuming 
the matter statistics to be known. 

In the second step we propose to use the set of reconstructed 
density los spectra sorted on a lower resolution grid to recover the 
large-scale structure and measure the BAO signal (see flowchart 
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Figure 1. Flowchart of the mul tiscale reconstruction m ethod based on the 
ARGO code (first reference in iKitaura & E nBlin 2008). We start with the 
high resolved (small scales) flux Fj(z) along the los spectra {j} from 
which we reconstruct the ID matter overdensity 5^j 2 (z) with angular and 
redshift coordinates z including redshift distortions indicated by the super- 
script z. Assuming linear theory we transform the density into a comov- 
ing frame r obtaining <5^ z (r). The set of reconstructed density lines are 
sorted in a lower resolved grid (with grid positions tq) than the binning 
of the spectra (large scales) leading to an incomplete matter overdensity 
field in redshift-space S^ B ' Z (tq). We use the Bayesian framework based 
on the Gibbs-sampling approach to jointly sample the matter density field 
SmI^g) in real-space with the Hamiltonian sampling scheme, the power- 
spectrum with the inverse Gamma distribution function and correcting for 
redshift distortions with Lagrangian perturbation theory (see §433) . From 
the power-spectra the BAO signal (/baoW) can De measured. The as- 
terisks indicate steps in which a set of cosmological parameters has to be 
assumed. 



at Fig .[7J. Here a Poisson/Gamma-lognormal model jKitaura et al.l 
I2OIO1) is adopted (a Ga ussian-lognormal m odel could also be ad- 
equate, see 5 lC031 and IPichon et al.ll200lh . The gaps in-between 
the lines are randomly augmented by sampling the full posterior 
of this model with the Hamiltonian Markov Chain Monte Carlo 
technique, thus solving for mask indu ced mode-coupling in the 
power- spectrum (IJasche & Kitaural2O10f) . We show how to sample 
the power-spectrum corresponding to the Gaussian distributed vari- 
able associated to the lognormal assumption and how to correct for 
linear and quasi-nonlinear redshi ft distortions within the Bayesian 
framework following the idea of Kitaura & EnBli^ d2008l) . Details 
of both ID and 3D reconstruction methods are presented in the next 
sections. 



3 ID RECONSTRUCTION: SMALL SCALES 

The purpose of this section is to make an estimation of the uncer- 
tainties in the recovery of the overdensity field 8m(z) (Am (z) = 
1 + 5m{z) with the subscript M standing either for the baryonic: B 
or the dark matter: D) along the los from quasar absorption spectra. 

The usual approach in signal reconstruction is to define a data 
model, i. e. the equation relating the observational data (in our case: 
the flux) to the desired underlying signal (in our case: the dark mat- 
ter density field), and then invert this relation. In this approach, a 
good understanding of the d ata model will lead t o a bet ter estimate 
of the signal. This is the wav lNusser & Haeh nelt ( 1999) propose to 
recover the density field from quasar absorption spectra. 



© 0000 RAS, MNRAS 000, 000-000 



4 Kitaura et al. 




4750 4800 4850 4900 4950 

A obs [A] 

Figur e 2. Top-most panel: e xample of synthetic flux quasar absorption spectrum F as a function of the observed wave-length in Angstroms A t, s [A] , simulated 
as in lGallerani et alj i20 1 Top-middle pane l: the original density field is shown through a black line, while the red line denotes the recovered density field 
through our method (see lGallerani et alj201 lh . The cyan shaded region represents the 15% relative error on the density field reconstruction in all the panels; 
Bottom-middle panel: the original density field smoothed to a scale of 5 comoving Mpc is shown through a black line, while the red line denotes the recovered 
density field smoothed to the same scale; Bottom-most panel; same as in the bottom-middle panel but adopting a smoothing scale of 10 comoving Mpc. 



However, we propose to adopt a statistical model as described 
below (see also Gallerani et alj201lh . 



3.1 Statistical data model 



As it was already noticed bv lNusser & Haehnelj fl999h the thermal 
history and the ionization level o f the IGM can be constrai ned if 
one knows the matter statistics. In lNusser"& Haehnelt(1999) work 
this is done as a consistency check once the dark matter field has 
already been inverted. Then the matter statistics extracted from the 
recovered density field is compared to the theoretical model. 

One can instead u se the flux PDF, P(F ), and directly relate it 
to the matter statistics dGallerani et al j[20lTh . Assuming that there 
is a one-to-one relation between the observational data (the flux) 
and the signal (the matter field) through their probability densities 
leads to a statistical data model: 



dFP(F) = 



dA M P(A„ 



(1) 



with F t and AJ^ being chosen in such a way that both integrals give 
the same area. This data model assigns to each flux F* a unique 
overdensity Ajyr. Such a relation is however not unique due to ther- 
mal broadening, nonlocal biasing and peculiar motions. The first 
two effects ca n be shown to cancel out averaged over large scales 
(see § 13,31 and Galler ani et al.ll20lH) . Note that the peculiar veloc- 
itie s along the los can be estim ated in an iterative fashion follow- 
ing jNus^ex&^aehnelj d 1999h . The disadvantage of this approach 
is that the Doppler parameter needs to be constrained in the Voigt 



profile to perform the deconvolution from redshift-space to real- 
space. We will therefore take care of the peculiar motions in the 3D 
reconstruction (see 

A model for nonlocal biasing becomes unnecessary for our 
large-scale structure analysis regarding the scales of interest (see 
discussion in § 13. 2b . Additional complications to biasing could arise 
in the presence of an inhomogeneous UV-background. Such an ef- 
fect would become hard to quantify, however, we expect this contri- 
butio n to be small at the redshifts of interest (see Maselli & Ferraral 
12005b . 

Due to noise in the measurements there is a minimum Fmi n 
and a maximum f mlx detectable transmitted flux which are related 
to characteristic overdensities A M and A M , respectively. This puts 
constraints on the integral limits in Eq. Q}: F m i n < F* < _F m ax 
and hence Am < A* < A M . The minimum and maximum de- 
tectable transmitted flux can be calculated from the observed noise 
root mean square deviation: F m in ~ <Jn\ Knax = 1 — <7 n . Knowing 
-F m in and _F max we can determine A M and A M using the following 
analogous relation: 



/' 

Jo 



dFP(F) 



dFP(F) 



Ja* 



dA M P(A M ). 



dA M P(Ai 



(2) 



(3) 



The problem of underestimating (overestimating) the density 
in regions with Am ^ A M (Am ^ A M ) can be easily solved 
in our approach by randomly augmenting the density according to 
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the matter statistics lGalleranietaT]|201ll) . This could be impor- 
tant in redshift ranges with large saturated regions estimating the 
power-spectrum. Note however, that in the redshift range z —2-3 
we do not run into such problems as the saturated regions are rare 
( <, 10%). 

The inference of Afj — Aq — A^ only depends on the assumed 
PDF of the density field and not on any assumption concerning the 
IGM thermal and ionization history (see Eqs. [T] |2] [3j> which are 
encoded in the observed PDF of the transmitted flux. 

Summarizing, the method we intend to adopt to reconstruct 
the density field along the los towards a quasar is the following: at 
each bin with F, sC F m i n we assign an overdensity Am, at each 
bin with F* ^ F max we assign an overdensity Am, while at each 
flux F m j n < F t < F max we associate an overdensity A M satisfy- 
ing Eq. l[T}. The uncertainties in the reconstruction can be obtained 
by propagating the uncertainties in the measured flux PDF and the 
assumed matter PDF with Eq. Q}. 

The only parameters required in the statistical data model are 
those describing the matter PDF; the rest of parameters are con- 
strained by observations. In the lognormal approximation the mat- 
ter statist ics is fully determine d by the corresponding correlation 
function dColes & Joneslll99ll) . The dark matter correlation func- 
tion is determined by the cosmological parameters. If one wants to 
recover the baryon density field one has to include a bias to trans- 
form the dark matter correlation function into the baryonic cor- 
relation function. As we have discussed above the lognormal ap- 
proximation fails on these small scales. If one perturbs the lognor- 
mal PDF with the Edgeworth expansion, one can model the skew- 
ness and kurtosis of the matter PDF with the three point and the 
four p oint statistics w hich for the univariate case are simply num- 
bers dColombilll994l) . This introduces two additional parameters. 
Miralda-Escude et al. ( 2000) present a fitting formula based on nu- 
merical N-body simulations which requires four parameters and ac- 
counts for the baryonic matter density distribution. Note, however, 
that implicitly more parameters flow in through the numerical cal- 
culations. In particular the initial conditions will be determined by 
the cosmological parameters. 

3.2 Bias relation between baryonic and dark matter 

Here we describe the biasing model we are considering in the 
ID numerical experiments. As we have discussed in the section 
above, we can avoid having to formulate explicit biasing relations 
by choosing the matter statistics corresponding to the species we 
want to recover. If we set the dark matter statistics in Eq. (TJ we 
will get an estimate for the dark matter density field. 

Assuming that the low-column density Lya forest is produced 
by smooth fluctuations in the intergalactic medium which arise as 
a result of gravitational instability we can relate the overdensity 
of the IGM to the da rk matter overdensity through a convolution 
(Bi & Davidsenll 19970 . In Fourier space representation we have: 



B(k,z)5 D (k, 



(4) 



where 5n(k, z) is the Fourier transformed dark matter linear over- 
density wit h the hats denot ing the Fou rier transformation (see dis- 
cussion in lViel et alj |2002). Following iBi & Davidsenl d 19971) one 
approximates the bias by: B(k,z) = (1 + fc 2 /fcj) -1 which de- 
pends on the comoving Jeans' length 

2jk B T (z) 



kj\z) = H 



■nl/2 



(5) 



0, 2 = 




Figure 3. Probability distribution function of the recovered density field 
relative error (e v = (Ab — Ab)/Ab with Ab being the reconstructed 
density field) obtaine d by applying the met hod of density field reconstruc- 
tion implemented bv lGallerani et alj fepllb to 10 synthetic quasar absorp- 
tion spectra by neglecting the peculiar velocity corrections (i. e. in redshift- 
space). The solid line represents the full resolution case, the dotted line 
shows the results for density fields smoothed to a scale of 5 comoving Mpc, 
the dashed line in the case of density fields smoothed to a scale of 10 co- 
moving Mpc. 



density, /i the IGM molecular weight, Qom the present-day matter 
density parameter and 7 the ratio of specific heats. The squared bias 
gives us an estimate of the power- spectrum of the IGM -Pb(&, z) 
related to the dark matter power- spectrum Pu(k, z): 



P B (k,z) = B 2 {k,z)P D (k y z). 



(6) 



_3^m p O,om(l + z) 
with fcg being the Boltzmann constant, To the temperature at mean 



Other more complex biasing relations have been proposed in the lit- 
eratur e dGnedin & Huill998l : lNussej2000l : lMatarrese & Mohavaed 
l2002h . 

It was shown bv lGnedin & Huil d 1998b (see Fig. 1 in that pub- 
lication) that biasing affects only the very small scales: k J> 2 h 
Mpc -1 . We can thus avoid biasing in the large-scale structure anal- 
ysis by choosing a box with modes smaller than k = 1 h Mpc - . 
Our synthetic 3D reconstructions include modes up to k ~ 0.6 
h Mpc -1 (see § |4.2t . This restriction does not affect the study of 
BAOs. Therefore, from now on we will make no distinction be- 
tween the baryon and dark matter overdensity field and call it sim- 
ply the matter field: 5m = 5b = 5n. Let us nevertheless study the 
impact of the physics in the density reconstruction on small scales 
in more detail below. 



3.3 ID numerical experiments 

In this section we study the impact of saturated regions and ther- 
mal broadening together with peculiar velocities to the analysis of 
the cosmological la rge-scale structure. In order to do so, we follow 
the prescription bv lGallerani et al.l J2006l) a lso adopted in the de - 
velopment of our ID reconstruction method dGalleranietal.l201lh . 
Note that the spatial distribution of the baryonic density field and 
its correlation with the peculiar velocit y field are taken into ac count 
adopting the formalism introduced by IBi & Davidsenl dl997t) . We 
generate with these assumptions a synthetic flux quasar absorption 
spectrum (see upper-most panel in Fig.O an d recover the underly- 
ing baryonic density field with our method dGallerani et al.ll201lh 
(see top-middle panel in the same Fig.). We then smooth the recon- 
struction to Ls = 5 and Ls = 10 Mpc scales (see bottom-middle 
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and bottom-most panels in Fig. [2}- In Fig. lO the relative error of 
the reconstructions is represented showing that the errors are small 
and become symmetric distributed at large scales. This exercise 
shows us that, on small scales (< 1 Mpc), once peculiar motions are 
included, the recovered density field is slightly shifted with respect 
to the original one. At large scales, one important point to notice 
is that the maximum and the minimum overdensities get below and 
above the saturation threshold (—1.0 <. log 10 Ab <J 0.5). 

We can there fore conclude from t his study together with the 
ones performed in lGallerani et alj d20 1 lh that saturation and ther- 
mal broadening will not affect the large-scale structure analysis. 
The main contribution to the uncertainties in the estimation of the 
density along the los with our method come from the peculiar mo- 
tions. We will show how to correct for redshift distortions in the 3D 
reconstru ction procedure within t he Bayesian framework as pro- 
posed bv lKitaura&Enfilinl (2008). Moreover we will demonstrate 
how to jointly sample the power-spectrum and the 3D matter field. 
Here we are neglecting errors in the determination of the continuum 
flux. However, the propagation of the error in this quantity should 
be investigated in future works. 



4 3D RECONSTRUCTION: LARGE SCALES 

A set of multiple los quasar absorption spectra leads to a sparse 
and incomplete distribution of the dark matter density field. In this 
section we present the Bayesian reconstruction method to perform 
an analysis of the large-scale structure based on such data follow - 
ing the works bvlKitaura & Enfilinl J2008I): iKitauraet alj J2009h ; 
Ijasche et al.l ( l2010l) : lKitaura et al.U2010)) ; |jasche & Kitaurd 12010) . 
In this work we further extend the aforementioned techniques to 
perform joint reconstructions of matter fields and power-spectra in- 
cluding redshift distortions correction. In this approach a model for 
the signal (here: the 3D matter field) is defined through the prior 
distribution function and a model for the data (here: recovered ID 
density los) is defined through the likelihood. 

The density estimates along each quasar sight line lead to the 
gridded density field in redshift-space on a 3D mesh which we de- 
note by 5^' z with cell i and a total number of iV c cells. There 
are two sources of uncertainty associated to this quantity: the first 
comes from the uncertainty in the ID density reconstruction from 
the quasar spectra; the second comes from the completeness in each 
cell. According to our findings the errors in the matter reconstruc- 
tion along the los using our method should be dominated by pecu- 
liar motions (neglecting errors in the determination of the contin- 
uum flux) after the gridding step (see §[3j- Let us write a data model 



for the degraded overdensity field in redshift-space 5^ B ' Z 

cobs, 2 n x z i 



(7) 



where R represents the response operator and e z the noise. We will 
make the simplifying assumption that the response operator is a 
diagonal matrix given by the 3D completeness (or selection func- 
tion) Rij — WiSfj (where Sfj is the Kroenecker delta and Wi is the 
completeness at cell i). The completeness represents the accuracy 
with which each cell has been sampled by the quasar sight spec- 
tra. The likelihood should therefore model the uncertainty in the 
density field in each cell as a function of the completeness in that 
cell. 

In the numerical experiments we will use the Poisso- 
nian/Gamma likelihood as we assume that the ID reconstruction 
has negligible errors and the uncertainty is only due to the incom- 
pleteness and peculiar motions. Note that this work could be ex- 



tended within the same formalism to incorporate more complex 
correlations in the noise covariance u sing a Gaussian likelihood 
(see appendix ICland lPichon et al]|200ll) . 

We will show how to treat peculiar motions separately with a 
Gibbs-sampling scheme. Therefore we will first focus on the mat- 
ter field reconstruction assuming that the data are transformed into 
real-space S^ B . Let us define the statistical model for the matter 
field in real-space. 

4.1 Prior for the matter statistics on large scales in real-space 

We assume a multivariate lognormal distribution for the matter den- 
sity field. The logarithm of this prior reads: 



\n(P(s\c)) 
1 



(8) 



-In 



((271-)^ det(S)) 



+ i S t S- 1 s, 



with s = ln(l + Sm) — H s , /x s = (ln(l + <5m)) and S being the 
covariance of the field s. The corresponding gradient yields: 

91n(P( S |c)) _ . 



t)s 



(9) 



with c being the set of cosmological parameters which determine 
the covariance S. To see how the covariance S is related to the 
Gaussian po wer-spectrum P(k) an d how to calculate the mean 
field fi s see Coles & Jones! dl99 ll) (detailed derivations can be 
found in lKitaural2O10l) . 

We should note here that the lognormal prior is a Gaus- 
sian with zero mean for the variable s = ln(l + Sm) ~ 
fi s . In several works this quantity has been interpreted as 
an e stimate of the linear component of t he density field (see 



Vieletal. 2002; Gallerani et al. 2006; Neyrinck et al. 2009; 



IKitaura & Anguld l201l1) . This motivates us to sample the power- 
spectrum corresponding to the variable s instead of the power- 
spectrum of the nonlinear matter field Sm as we will show below. 
When generat ing random lognorm al fields with a given power- 



When generating random lognormal tields with a given power- 
spectrum (see e.g. IPercival et al JI2004 ) or in the case of lognor- 
mal density reconstructions (see e.g. lKitaura et al.|[2010h one does 
not know the full nonlinear density field so that the mean field 
fi 3 cannot be directly computed from <5m ( fi s = (ln(l + Sm))) 
but it is calculated f rom the lognormal correlation function (see 
IColes&Jonelll99ll) . However, as we want to sample over the 
power-spectrum we should not keep any dependence on the theo- 
retical model of the power-spectrum. Actually one can demonstrate 
that the mean field fi s is fully determined by the Gaussian variable 
s by imposing fair samples of the density field ({Sm) = 0) 

^ = -ln((exp(s))) , (10) 

(for a demonstration see appendix lAl. 



4.1.1 Joint matter field reconstruction, power-spectrum 
sampling and redshift distortions correction 

Here we propose a simple and straightforward approach to jointly 
sample power- spectra, non-Gaussian density fields and peculiar 
motions. The idea consists on using the Gaussian prior for the 
matter field and encoding the transformation of the non-Gaussian 
density field into its linear-Gaussian component in the likelihood. 
The advantage of this approach is that it permits us to model 
the PDF of the power-spectrum with the inverse Gamma distribu- 
tion in a consistent way under the Gaussian prior assumption (see 
IKitaura & Enfilinl 120081 : [jasche et al.ll2010l) . We now rely on the 
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Figure 4. Left panel: ratio between the power-spectra in redshift (P z (k)) and in real-space (P(fc)) adopting the formalism described in appendix B. Right 
panel: successive power-spectra showing the redshift distortions correction including the linear power-spectrum (green), the power-spectrum in redshift-space 
(red) and the power-spectrum in real-space (violet). 



lognormal transformation to relate the nonlinear density field and 
its linear cou nter-part as it has been done in othe r works to model 
the IGM (see lViel et ai]|2002klGallerani et alj2006h. Also note, the 
recent work s on the lognormal transfo rmation bv iNevrinck et al.l 
d2009Ll201 ll) : lKitaura & Angulol ilOl ll) . The validity of this formu- 
lation does not depend on how accurately the lognormal approxi- 
mates the linear Gaussian field. As far as this work is concerned, we 
need to test whether the resulting sampled power- spectrum will re- 
semble the actual linear one. We will restrict ourselves in this work 
to the large-scale structure at redshift z — 3. Future work should 
be done to extend this study to other redshifts. 

The PDF P(s, v, S|d z ) considered here is given by the joint 
PDF of the signal s, the peculiar velocity field v and the power- 
spectrum (or its real-space counter part: the correlation function 
S). Notice that we use the following notation: the data vector in 
redshift- and real-space is given by d z = {f>Mi' 2 } and d = {<5m,1} 
respectively. Knowing how to sample the conditional probability 
distributions of each variable 



s U+1) P(s I v u) ,S, 

Sti+D ^ P(S|s 0+1) ) 
O+i) ^ p( v j s 0+i)) 



v 



(11) 
(12) 
(13) 



permits one to sample the jo int probability distribution using the 
Gibbs-sampl ing scheme (see iGernan & Gemarj [l984; Neali ll993l ; 
iTanneil 19961) . Note that the arrows indicate sampling from the con- 
ditional PDF. 

Let us describe below how to sample each conditional proba- 
bility distribution function 

(i) Matter field reconstruction (Eq.Qj} 

This step is done with the Hamiltonian sampling method under 
the Gaussian prior assumption for the variable s and encoding the 



J As we already discussed in [|3]one could alternatively correct for redshift 
distortions in the density along the los. Consequently one would have the 
gridded data vector for the 3D inference analysis directly defined in real- 
space. 



lognormal transformation in the likelihood (particular expressions 
and details to the Hamiltoni an sampling technique can be found in 
appen dix Icl and |Pl also see lKitaura et aljfeoid ; Ijasche & Kitaural 
l20ld) . 

(ii) Power-spectrum sam pling (Eq. [T2 l 

We follow the works of Jewell et al. ll2004l): I Wandelt et al. 



d2004l) 



Jasche et al. 



Eriksen et alj J2004I) : iKitaura & Enfilinl d2008l) ; 
1 20ld) and sample the power-spectrum corre- 
sponding to s with a likelihood given by the inverse Gamma 
distribution function and Je ffreys prior. For technical details in the 
implementation we refer to lJasche et al.l J20101) . 
(iii) Redshift distortions correction (Eq.fLTt 
Here we further develop the idea of incorporating the pe- 
culiar velocity estima tion in a Gibbs-sampli ng scheme (see 
IKitaura & Enfilinl2008l) . We follow the works of l Raised dl987h and 
in particular the ex pressions derived in l inear Lagrangia n perturba- 
tion theory (LPT) jzel'dovich|[T970h by |McGillH1990iJ) who stud- 
ied the formatio n of caustics in the Lyman alpha forest within this 
framework (see lMcGil3ll990bl) . Note, that higher order LPT may 
be applied following the works of Monaco & Efstathioul ( fl99^) . We 
should mention here that our problem (a set of quasar absorption 
spectra in redshift-space) requires a treatment which corrects for 
redshift distortions directly of the overdensity field, instead of the 
particle positions as in the case of galaxy redshift surveys (for the 
latter case see the works of Hivon et al. 1995; Nusser & Br anching 
l200d: iBranchini et al.ll20o3 ; lLavaux et alj|2008h - We show in ap- 
pendix [B] that the observed overdensity field in redshift-space 
Sm S ' Z as obtained after gridding the reconstructed overdensity 
fields along quasar sight lines is related to the same field in real- 
space S^ B through the following expression 



robs I cobs 
°M,i + °v,i , 



(14) 



where one can define an observed peculiar overdensity field 5° hs 

(15) 



robs r > 

v i — WiO v i -y- (L v • 



which suffers from noise e Vti and selection function effects Wi. The 
particular expression for the undegraded peculiar velocity term is 
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Figure 5. Slices of ~10 h~ Mpc width averaged over 10 neighbouring slices through the center of the box with 1.34 h^ 1 Gpc side length (Y = 0) showing 
upper left panel: the density field in real-space 5m, upper middle panel: the velocity term S v multiplied by a factor of 2 for visualization purposes, the 
upper right panel: the matter field in redshift-space 5 M , lower left panel: the completeness according to the distribution of about 130000 mock quasar spectra 
multiplied by a factor of 3 for visualization purposes (ranging from to ~ 0.33), lower middle panel: 10th sample of our reconstruction, lower right panel: 
sample 400. 



given by 

8 V = -(Hay 1 ^ ■ (v-r)r , (16) 

where H is the Hubble constant, a is the scale factor and f is the los 
direction vector. We consider linear theory to estimate the peculiar 
velocity field <5 M oc — V • v. 

In summary, the data vector is transformed from redshift- to real- 
space for each Gibbs-sampling iteration in step (iii) computing 

cobs robs, 2 JC°b s /1"7\ 

9 M,. - — °v,i ■ U'J 

This relation is used for the next iteration in step (i) to compute 
the matter field reconstruction and subsequently obtain the power- 
spectrum sample in real-space (step ii). Note that in the first iter- 
ation one must assume that the real- and redshift-spaces are equal 

cobs cobs, 2 

°M,i — °M,i ■ 

4.2 3D numerical experiments 

In this section we pres ent numerical tests with the large N-body 
simulation provided bv lAnguloetal] [2008) of 1.34 h' 1 Gpc side 



length at redshift z — 3 showing how to solve for incompleteness 
in a set of multiple los density tracers and for redshift distortions 
to recover the large-scale structure, the power-spectrum and ulti- 
mately the BAO signal. 



4. 2. 1 Redshift distortions 

Here we validate the formalism presented in section H4. 1.1 l and ap- 
pendix [B] In particular we check Eq.[l4]by comparing the power- 
spectra of the resulting overdensity field in redshift-space to the 
actual one from the N-body simulation. We find that convolv- 
ing the density field with a Gaussian kernel of smoothing length 
rs = 1.25/i _1 Mpc to estimate the velocity term gives an excellent 
fit to the actual ratio from the N-body simulation being close to the 
constant Kaiser factor (~ 1.8) at large scales ( see left panel in Fig.|4] 
in this work and Fig. 7 in lAngulo et alj2008l) . 

In the upper panels of Fig. [5] we can see slices of the N-body 
simulation in real-space 8m (left panel) with the corresponding 
velocity distorted component S v (middle panel) and the resulting 
overdensity field in redshift-space S M (right panel). As the observer 
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Figure 6. Cell-to-cell correlation between the real density field 5m and the one in redshift-space (left panel), in redshift-space including mask and noise 
S Z / w ( me division by the mask w is meant to give an estimate of the density field) (middle panel), and an arbitrary reconstructed sample 5JJj c taking as 
input data <5^ s ' z (note that different converged samples give extremely similar results). The different colours indicate the number density of cells for each 
overdensity bin. Low density values are represented by light and high values by dark colours. 



is located to the right of the box (at z = 0) one can see the elon- 
gated structures along the Z-axis. 



4.2.2 3D mask and mock density tracers 

The completeness of a quasar spectra distribution can be directly 
extracted from the data. Only cells in which spectral lines have 
been detected have nonzero completeness to our purposes, other- 
wise the cell has not been traced by the Lya forest. Either because 
there are no quasars in the background or that region of the sky 
was not observed, with the reason being irrelevant. The complete- 
ness in observed cells can be calculated from the number density of 
spectra crossing that cell. Therefore to estimate the 3D mask (com- 
pleteness) we need to simulate a distribution of quasar spectra. 

To know an approximate number of quasars in a 1.34 h~ x Gpc 
comoving volume at redshift z =2-3 one needs to first calculate the 
number of dark matter halos with masses greater t han ~ 10 12 solar 
masses. Looking at Fig. 1 in iMo & White] d2002h one finds about 
10 7 objects. In order to know which fraction of halos in that mass 
range will host a quasar one has to consider the duty cycle. Assum- 
ing a duty cycle of 10 -3 — 10 -2 will leave us with about 10 4 -10 5 
quasars. Following previous works on forecasts for the BOSS sur- 
vey we assume that we have about 130000 q uasar spectra in th e 
volume of 1.34 h~ L Gpc side length (see e.g. ISlosar et al.] 2009). 
The mock quasars are uniformly distributed with right ascension 
(a) and declination (8) in the following ranges: 105° < a < 270° 
and -5° < 5 < 70° and redshift 1.8 < z < 3.5 emulating the 
Sloan survey. Then we ascribe to each mock quasar a Lya forest 
with a maximum length of z = 0.5 and assume a resolution in the 
spectra of about 20-40 h^ 1 kpc (roughly 100-1000 bins with inter- 
vals of ~ 10-20 h^ 1 Mpc comoving length) towards the observer. 
The high resolution in the spectra permits us to neglect aliasing ef- 
fects. All the redshift bins which are closer than z — 1.8 to the 
observer are discarded. We then transform each bin of the set of 
mock Lya spectra to comoving coordinates and grid them in a box 
of 1.34 h" 1 Gpc and 128 3 cells. The relative density of spectra 
determines the completeness in each cell. 

The density field is generated by gridding the dark matter par- 
ticles on a 256 3 mesh, then deconvolving with the mass-assignment 



kernel and finally applying a low-pass filter t o map the field in 
Fourier-space to a l ower resolution of 128 3 (see ljasche et alj2009l : 
iKitaura et ai]|2009l , and references there-in). Note that this proce- 
dure enables us to get nearly alias-free 3D fields. We should note, 
that techniques to solve for aliasing effects on the power- spectrum 
(see e. g. Jing;2005) cannot be considered here as we perform a 3D 
analysis which includes the phase information. 

Finally, we generate a Poissonian/Gamma (see appendix [Ct 
subsample with the previously calculated completeness from the 
density field in redshift-space. This is equivalent to assume that the 
density field in redshift-space corresponding to each mock quasar 
is nearly perfectly recovered on a resolution of 10-20 h~ Mpc and 
the noise in each cell is dominated by the completeness. 



4.2.3 Reconstructions 

We start running only the Hamiltonian sampling scheme together 
with the velocity sampling, as we find that additional power- 
spectrum sampling leads to extreme unphysical peculiar velocities 
in the burn-in period which slow down the Markov Chain or even 
make it crash rapidly increasing the power on large scales. Once the 
the chain is nearly converged we start sampling the power- spectrum 
as well. This happens only after a few hundred samples as we can 
see from the right panel in Fig. [7] We find that the errors in the 
determination of the peculiar velocity field, i. e. e v ,i = can be 
neglected. Note however, that we have here full control over the 
model as we have tuned it with the N-body simulation. In real-data 
applications we should include errors which will of course increase 
the variance in the density fields and power-spectra. 

We also found that rarely (< 1% of the samples) some density 
samples still lead to unphysical peculiar velocities after the burn- 
in period. We note here that we are using Jeffreys non-informative 
prior for the power-spectrum and more constrained p riors may be 
used (see IKitaura & EnBlinll2008l ; Ijasche et alj [2010) which could 
avoid this problem. We have introduced here a rejection step for the 
samples which have peculiar velocity terms with 5 V > 0.7 as the 
typical values are below 0.6. 

The Gibbs-sampling procedure is run for about 10000 itera- 
tions leading to an ensemble of density fields, power- spectra, and 
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peculiar velocities. In the lower panels of Fig.|5]we can see slices of 
the mask (left), the 10th density sample (middle) and the 400th den- 
sity sample (right). We can appreciate in the middle plot how the 
regions with higher completeness (red-yellow regions in the colour 
code in the left panel) are especially elongated along the Z-axis 
showing that there is more information about the field in redshift- 
space which has to be transformed into real-space. The region on 
the right of the box is smoother as there is less information there. 
The sample on the right of the figure shows well balanced structures 
and no noticeable transition from regions with higher completeness 
to lower ones demonstrating that the Hamiltonian sampling tech- 
nique correctly augments the field according to the model and the 
data. 

To see how the whole Gibbs-sampling scheme is working in 
a more quantitative way we show in Fig. [6] cell-to-cell correla- 
tions between the overdensity field in real-space from the N-body 
simulation against the field in redshift-space (left panel), the input 
data (middle panel) and one reconstructed sample (right panel) af- 
ter convolution with a Gaussian kernel of smoothing length rs = 
10 h~ Mpc. The left panel shows in another way the same effect 
as we see in Fig. [4] namely the excess of power on large scales in- 
troduced by the linear redshift distortions which dominate at these 
relatively high redshifts (z ~ 3). It is notable how the range for the 
overdensity field in redshift-space 8yi is larger than the one for the 
field in real-space 5m ■ The middle plot shows the additional effect 
of unsampled regions and noise by having an additional elonga- 
tion around the mean density (S^'" ~ 0) and a larger dispersion. 
Note that we use the field divided by the completeness which is the 
flat prior estimate f or the density field under selection effects (see 
iKitaura et al]|2009l) . The panel on the right demonstrates that the 
density samples yield unbiased estimates of the field on scales of 
about 10 h^ 1 Mpc. Notice that the field <5JJ C is the nonlinear esti- 
mate of the overdensity field computed from our samples by using 
the lognormal transformation 5JJj c = exp(s + fj, a ) — 1. 

We finally show in Fig.[7]fhe ensemble of nearly 10000 power- 
spectrum samples summarized by the mean power-spectrum (black 
curve) together with the 1 sigma and 2 sigma countours. For com- 
parison we also show the power-spectrum in redshift-space (green 



curve) and in real-space (red curve) of the N-body simulation. Note, 
that the reconstructed samples are considerably closer to the N- 
body simulation in real-space than in redshift-space in terms of the 
power-spectra. However, we can also notice from this figure that the 
recovered power-spectra are closer to linear theory (dashed curve) 
than the red curve as we expect it from the lognormal lineariza- 
tion discussed in 34.1.11 The mean of all samples is normalized 
by the assumed fiducial power- spectrum to visualize the BAO wig- 
gles on the right panel. Note that cosmic variance is considered in 
a Bayesian way by sampling the inverse G amma distribution (see 
Kitaura & EnBlin 2008; Jasche et al. 2010). The excess of power 
with respect to linear theory at small scales (fc J> 0.2 hMpc~ ) can 
be due to various reasons: the lognormal approximation does not 
give perfect linearized fields, the aliasing of the gridding scheme 
leads to complex correlations between neighbouring cells which 
are not handled, or the peculiar velocity fields are not properly cor- 
rected on these scales. Also note, that we are neglecting strong cor- 
relations of the noise along quasar sight lines inside the cells which 
may increase the error bars, especially on small scales. We leave 
such an investigation for later work. Note that the deviation from 
linear theory starts at scales in which the BAO signal is washed out. 
Our tests demonstrate the validity of the method to sample density 
fields, power-spectra and peculiar motions on large scales (> 20 
h^ 1 Mpc) from sparse noisy data in the quasi-nonlinear regime 
and its use for BAO measurements. 



5 SUMMARY AND CONCLUSIONS 

We have presented in this work a novel method for the joint recon- 
struction of cosmological matter fields, power-spectra and peculiar 
velocity fields in the quasi-nonlinear regime. 

We have applied this method to large-scale structure analysis 
from the Lya forest based on multiple quasar absorption spectra. 
For this kind of studies we have proposed to split the reconstruction 
problem into two steps. The complex physical relation between the 
flux of quasar absorption spectra and the ID dark matter field along 
the lines-of-sight is solved in the first step. This permits us to apply 
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simple statistical models in the second step to analyze the 3D large- 
scale structure. 

Our method assumes the adequate matter statistics in each 
scale, being completely flexible in the choice of a non-Gaussian 
matter PDF in the first step and using the multivariate lognormal 
model for the large-scales in the second step. 

In the first step, we have based the ID reconstruction of the 
matter density fields through the Lya forest on a method recently 
introduced by iGallerani et alj d201lh which has the advantage of 
being free of any assumption on the thermal history and the ioniza- 
tion level of the IGM. In this work we have shown that saturation 
and thermal broadening will not affect our large-scale structure re- 
construction assuming that the errors in the determination of the 
continuum flux are under control. The main contribution to the un- 
certainties in the estimation of the density along the line-of-sight 
quasar spectra depends on the peculiar motions and the complex 
masking of a quasar absorption spectra distribution. 

In the second step we propose to apply the Bayesian frame- 
work with the Gibbs-sampling approach to jointly sample non- 
Gaussian density fields, power-spectra and peculiar motions. The 
idea consists on using the Gaussian prior for the matter field and 
encoding the transformation of the non-Gaussian density field into 
its linear-Gaussian component in the likelihood. The advantage of 
this approach is that it permits us to model the PDF of the power- 
spectrum with the inverse Gamma distribution in a consistent way 
under the Gaussian prior assumption. We rely on the lognormal 
transformation to relate the nonlinear density field and its linear 
counter-part. Redshift distortions are corrected by sampling the pe- 
culiar velocity field with linear Lagrangian perturbation theory. 

Finally we have validated our method with a large N-body 
simulation. We found that from strongly biased input data due to 
redshift distortions and masking our method recovers unbiased den- 
sity samples on scales of about 10 /i -1 Mpc. The reconstructed 
power-spectra of the Gaussian variable in the lognormal model turn 
out to be closer to linear theory as we already expected from previ- 
ous works. We have restricted ourselves to redshift z = 3. Future 
work should be done to extend this study to other redshifts and in- 
vestigate its use for galaxy redshift surveys. 

This work should contribute towards an optimal cosmological 
large-scale structure analysis in the quasi-nonlinear regime. 
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APPENDIX A: CALCULATION OF THE MEAN 
LOGNORMAL FIELD FROM THE LINEAR DENSITY 
FIELD 

The lognormal field (see IColes& Jone"slll99ll) gives an estimate 
of the linear-Gaussian density field (see e.g. I Viel et all 1 20021 ; 
iGallerani et aDl2006l : iNevrinck et alll2009h . 



s = ln(l + dm) - M s 



(Al) 



The mean field fi s is the ensemble average of ln(l + 8m) by defi- 
nition: 



(a) = (ln(l + * M ))-/J. = 0. 



(A2) 



We assume here that the fields are fair samples, i.e. (Sm) = 
and (a) = 0. One can find that the field fi s can be expressed as a 
function of the linear field s only: 

S M = exp(s + fi s ) — 1 

((5 M ) = (exp(s + mJ) - 1 = 
1 



expOJ = 



(exp(s)) 
-ln«exp( S )». 



In practice the mean field is calculated in the following way 



111 



Nc 



(A4) 



Note that to ensure that the field s has zero mean, we have to im- 
pose that the zeroth mode of its power-spectrum vanishes. 



APPENDIX B: RELATION BETWEEN THE DENSITY 
FIELD IN REAL- AND REDSHIFT-SPACE IN LINEAR 
LAGRANGIAN PERTURBATION THEORY 

To correct f or the redshift distortions in the linear regime we follow 
the work o f lKaiserlfl987h and in particular the expressi ons derived 
in lin ear Lagrangian pe rturbation theory (LPT) (see IZel'dovichl 
1 1970}) bv lMcGilll?1990al) . 

We can write in Lagrangian perturbation theory the Eulerian 
coordinates x as the sum of the Lagrangian coordinates q and a 
displacement field \E>: 

x = q + <&. (Bl) 

The analogous expression in redshift-space can be written as 

x z = q + * z , (B2) 

with the displacement field in redshift-space given by 

* 2 =* + /(*■ r)r , (B3) 

where / = dlnD/dlna, H is the Hubble constant, a is the 
scale factor and f is the line-of-sight direction vector. For flat 
models with a non- zero cosmological constant / ~ f2 5 / 9 (see 
Bouch et et al. I ll995l) , where Q(z) is the matter density at a red- 
shift z. Note that to linear order the velocity is proportional to the 
displacement field 

v = fHa * . (B4) 
Imposing conservation of mass one finds 

<5m = J" 1 - 1 ^ -V ■ * ~ -(fHa)' 1 ^ ■ v, (B5) 

and 

Su = J7 1 - 1 m -V ■ * 2 ~ <5 M - /V ■ (* ■ r)r , (B6) 

with J = det(dxfdq) and J z = det(dx z / dq) being the Jaco- 
bians of the corresponding coordinate transformations in real- and 
redshift-space expanded to linear order. We can then write the re- 
lation between the overdensity field in real- and in redshift-space 

as 

8 M = &m + 5v , (B7) 

with 

Sv = -/V • (* • r)r = -(//a) _1 V • (v ■ r)r . (B8) 

If we consider observational effects like a mask w and some noise 
e then we have in real-space 

Sm s = wS M + e , (B9) 
and correspondingly in redshift-space 

(B10) 



cobs. 2 CZ . 

d M = w z d M + e z 



Assuming that the mask is approxi mately the same in b oth spaces 
w = w z (for a discussion on this see Kit aura et alj2009l) we find 



(A3) 



Sm B ' Z = ™<5m + e + wS v + e v 



(Bll) 
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with 



APPENDIX D: HAMILTONIAN SAMPLING 



We can then define 



<5° bs = wS v + e v . 



from which following relation holds to linear order 

robs,z £°bs i robs 



(B12) 
(B13) 
(B14) 



APPENDIX C: LIKELIHOODS 

Here we present the likelihoods which can be used in our 
framework slight ly modified with respect to the ones derived in 
ItCitaura et alj d20 lOh as we want to estimate the signal s = ln(l + 
<5m) - /V 



CO. 4 Poissonian/Gamma likelihood 

The Poissonian distrib ution function (or Gamma function, see 
iKitaura & Enfflinl2008l . and references there-in) considers the noise 
to be independent from cell to cell and can includ e a binomial 
model for the completeness (see lKitaura et alj|2009f) . In this way 
the noise is proportional to the square root of the completeness 
in each cell. Note however that the peculiar velocities will in- 
troduce a correlation which we are treating separately within the 
Gibbs-sampling scheme (see ^4. 1.1b. The log-likel ihood with mean 
Xi — Wip(l + 8m, i) reads (see lKitaura et aljboifjf) : 

-ln£(<f|s)= (CI) 

S~] pun exp(si + p 3 ) — pi ln(Nwi exp(si + p s )) + T(l + pi) , 



with its gradient being: 

dlnC(d\s) 



dsi 



= pwi exp(s; + p, s ) — pi 



(C2) 



Please note, that the density in each cell pi (number counts in the 
discrete case) is a particular realization of the mean Xi and p repre- 
sents the mean density (mean number counts in the discrete case). 
The Gamma function is replaced by a factorial in the discrete case. 
Nevertheless, we show below (see Sec.|Dj that this term has not to 
be computed. The relation between the density p and the observed 
overd ensity 8m s is given by <5jJ,1 — Pi/~P ~ w i (see lKitaura et al.1 
120091) . 

CO. 5 Gaussian likelihood 

If one wants to include specific errors in each cell or even a com- 
plex correlation in the noise covariance matrix N one can use the 
Gaussian likelihood. The log-likelihood for a Gaussian distribution 
is given by 

- In C{d\s) = iln ((271-)^ det(N)) + ^N^e , (C3) 

with the noise being defined as: e = R<5m — d. The quantity which 
needs to be defined is the noise covariance matrix which in its sim- 
plest form is given by a diagonal matrix: Ntj = crfS^j, i. e. the 
effective variance in each cell: of. The variance in such a model 
should be computed summing up the contributions of all the errors 
in the ID reconstruct ion corresponding to a given cell. We refer 
to lPichonetallfeOOlh to model a more complex noise covariance 
matrix. 



In this appendix we revise the Hamiltonian sampling approach used 
to sample the conditional matter field and point out the difficulty 
in its direct use to power-spectrum estimation which justifies the 
approach presented in 34.1.11 Please also n ote, that the require d 
expressions sligh t ly cha nge with respect to IKitaura et alj J2010h : 
IJasche & Ki taura (2010) as we want to extract here the signal s = 
ln(l + <5m) — Ms ( see appendix lAl). 

The product of the prior and the likelihood is proportional to 
the posterior distribution function by Bayes theorem: 



P{a\d,c) 



P(s\c)C(d\s) 



(Dl) 



JdsP(s\c)C(d\s) ' 

where the normalization is the so-called evidence. For many ap- 
plications one can ignore the evidence. This is the case when one 
wants to find the maximum a posteriori solution or when one wants 
to sample the full posterior with a fixed power-spectrum. For all 
these cases the evidence can be considered just a constant. 

Let us define the so-called potential energy from the posterior 
distribution P(s\d, c): 

E(s) = -\n(P(s\d,c)) . (D2) 

For a lognormal prior and an arbitrary likelihood we have: 



E(s) = -In ( (2n)' 



' det(S) 



(D3) 



+ ^s T S 



tc-i. 



In (£ (<f|s)) + In / ds P(s\c)C(d\s 



Please note that the terms including the determinant of the correla- 
tion function and the evidence do not depend on particular realiza- 
tions of the signal but on an ensemble of signals. This permits us 
to consider them as constants when calculating its gradient. Hence, 
the gradient of the potential energy function with respect to the sig- 
nal is given by: 

dE{s) ^ „-i d 



dsi 



-ln(£ (d|s)) 



(D4) 



The full posterior distribution can be obtained by sampling with 
a Markov Chain Monte Carlo method. Given an analytic expres- 
sion for the posterior, from which appropriate derivatives can 
be calculated, one can use Dynamical and Hybrid Monte Carlo 
metho ds to sample its whole distribution (see the review bv lNeall 
1 19931). In partic u lar the Hamiltonian sampling method introduced 
by Duane et"al] (1987) has been proved to efficiently deal with 
linear and nonlinear problems in high d i mensional spacejf l (see 
iTavlor et al-Eoolljasche & Kitaurall2010l : ljasche et alj20ld) . The 
Hamiltonian sampling is a stochastic dynamics method based on a 
fhermodynamical analogy. In this model a force defined by the gra- 
dient of the potential energy with respect to the system's configu- 
ration coordinates changes its configuration through the momenta. 
The system visits different configuration states with a frequency 
given by its canonical distribution when it is exposed to a heat bath. 
This sampling process can be modeled through random realizations 
within the Hamiltonian scheme. 

Let us define the Hamiltonian total energy function: 



H(s,p) = K( P ) + E(s) 



(D5) 



4 Each cell i implies a statistical dimension. The problem we face here has 
therefore N c dimensions. 
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with a kinetic energy term constructed on the nuisance parameters 
given by the momenta p and mass variance M: 



K(p) 



m;. 



Pi 



(D6) 



Note that the potential energy E(s) was already defined in 
Eq. dD2t . 

Let us use the following compact notation: P(p) = P(p\M), 
P(s) = P(s\d,c) and P(s,p) = P(s, c, M). The canon- 
ical distribution function defined by the Hamiltonian (or the joint 
distribution function of the signal and momenta) is then given by: 



P(s,p) = 



— exp(-H(s,p)) 



~~ ex P(- 

P(p)P(s) 



-K(p)) 



-^-exp(- 



-E{s)) 



(D7) 



with Zh, Zk and Ze being the partition functions so that the prob- 
ability distribution functions are normalized to one. In particular, 
the normalization of the Gaussian distribution for the momenta is 
represented by the kinetic partition function Zk- The Hamiltonian 
sampling technique does not require the terms which are indepen- 
dent of the configuration coordinates as we will show below. 

From Eq. ( ID7b it can be noticed that in case we have a method 
to sample from the joint distribution function P(s,p), marginaliz- 
ing over the momenta we can in fact, sample the posterior P(s). 

The Hamiltonian dynamics provides such a method. We can 
define a dynamics on phase-space with the introduction of a time 
parameter t. The Hamiltonian equations of motion are given by: 



dsi 
~~dt 

dpi 
dt 



dH _ 

dpi 

dH 

dsi 



_ dE(s) 
dsi 



(D8) 



(D9) 



To sample the posterior one has to solve these equations for ran- 
domly drawn momenta according to the kinetic term defined by 
Eq. dD6l >. This is done by drawing Gaussian samples with a vari- 
ance given by the mass VI which can tune the efficiency of the 
sampler (see Jasche & Kita ural2O10l) . The marginalization over the 
momenta occurs by drawing new momenta for each Hamiltonian 
step disregarding the ones of the previous step. 

It is not possible to follow the dynamics exactly, as one has 
to use a discretized version of the equations of motion. It is conve- 
nient to use the leapfrog scheme which has the properties of being 
?ime-reversible and conserve phase-space volume being necessary 
conditions to ensure ergodicity: 



e\ , . e 8E(s) 

t+ 2) = Mt) -2^s7 



»<(*) 



Si(t + e) = Si(t) +e^Af lJ 1 p J (t + 



(D10) 
(Dll) 



sampling step. Although the Hamiltonian equations of motion are 
energy conserving, our approximate solution is not. Moreover, the 
starting guess will not be drawn from the correct distribution and 
a burn-in phase will be needed. For these reasons a Metropolis- 
Hastings acceptance step has to be introduced in which the new 
phase-space state (s' , p') is accepted with probability: 



Pa = mm[l,exp(-(5H))] 



(D13) 



with 8H = H(s',p') - H(s,p). 

The Hamiltonian sampling technique cannot be easily ex- 
tended to jointly sample the density and the power-spectrum. As 
we can see from Eq. l |D13t the Hamiltonian sampler does not re- 
quire the evaluation of the full joint posterior H (s, p), but just the 
difference between two subsequent states SH. Such a calculation 
is simple when the cosmological parameters are fixed, but becomes 
extremely complicated when one samples them within the same 
Markov Chain. The reason being that calculation of the determi- 
nant of the covariance and the evidence would become necessary 
in each Hamiltonian step (see Eq. |D3b . This problem can be over- 
come with the Gibbs-sampling scheme given that the conditional 
probability distribution functions are sampled in a consistent way 
as we show in 34.1.11 



Pi (t + e) 



e dE{s) 
2 dsi 



(D12) 



S;(t+e) 

The dynamics of this system are followed for a period of time At, 
with a value of e small enough to give acceptable errors and for 
N T — Ar/e iterations. In practice e and N T are randomly drawn 
from a uniform distribution to avoid resonant trajectories (see lNeall 
1 19931) . 

The solution of the equations of motion will move the sys- 
tem from an initial state (s,p) to a final state (s' ,p') after each 
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